Siemens: Functional Data Analysis Pipeline¶

Index

  1. Loading the datasets
  2. Preprocessing steps
    • 2.1. Data wrangling on time series
    • 2.2. Data wrangling on additional features
    • 2.3. Merging time series datasets to add additional features
      • 2.3.1. Removal of TestIDs that exist in only one sensor
  3. Window extraction
    • 3.1. Validating if there are partial or full missing values after the extraction
    • 3.2. Validating shape post-window extraction
    • 3.3. Merging scaled data with additional attributes of interest
    • 3.4. Balancing the specific attributes
    • 3.5. Windows visualization (balanced data)
  4. FPCA characterization
    • 4.1. Functional PC1 plots (both systems): Characterization of FPC Scores
    • 4.2 Linear Regression for slope
  5. Functional Regression
    • 5.1. Regression coefficients
    • 5.2. Coefficients visualization
In [1]:
#!pip install scikit-fda
import os
os.chdir("..")
In [2]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import altair as alt
import random
import statsmodels.api as sm
from skfda.representation.grid import FDataGrid
from skfda.preprocessing.dim_reduction import FPCA  # "projection" submodule is deprecated
from skfda.exploratory.visualization import FPCAPlot
from sklearn.preprocessing import OneHotEncoder
import skfda
from skfda.ml.regression import LinearRegression
from skfda.representation.basis import FDataBasis, FourierBasis
from skfda.exploratory.depth import IntegratedDepth, ModifiedBandDepth
from skfda.exploratory.visualization import Boxplot
# Import designed-functions
from window_extraction import calculate_window_values, calculate_window_data, Merge_data, align_to_zero, balance_index
from time_series_visualization import plot_all_time_series, plot_all_time_series_and_mean_fpca, plot_all_time_series_in_group
from functionalPCA import fpca_two_inputs, first_component_extraction, bootstrap, create_pc_scores_plots, visualize_regression
from functional_regression import Function_regression, coefficent_visualization

1. Loading the datasets¶

The file paths can be changed to match where the data is stored.
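One way to keep the file locations in a single place is to build every path from one base directory. The sketch below is only an illustration: `DATA_DIR`, `FILES` and `data_path` are hypothetical names that do not appear in the notebook, and the file names are the ones used in the loading cell.

```python
from pathlib import Path

# Hypothetical base directory; change it to wherever the raw files are stored.
DATA_DIR = Path("RawData")

# File names as used in this notebook, keyed by a short label.
FILES = {
    "sensorA_System1": "System1_SensorA.csv",
    "sensorA_System2": "System2_SensorA.csv",
    "sensorB_System1": "System1_SensorB.csv",
    "sensorB_System2": "System2_SensorB.csv",
    "keyByTestID": "Key by TestID.csv",
}

def data_path(label: str) -> Path:
    """Return the full path of one of the labelled files."""
    return DATA_DIR / FILES[label]

print(data_path("sensorA_System1"))
```

With this layout, moving the data only requires editing `DATA_DIR`; each `pd.read_csv` call would then take `data_path(...)` instead of a hard-coded string.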

In [3]:
# Import datasets
sensorA_System1 = pd.read_csv("RawData/System1_SensorA.csv")
sensorA_System2 = pd.read_csv("RawData/System2_SensorA.csv")
sensorB_System1 = pd.read_csv("RawData/System1_SensorB.csv")
sensorB_System2 = pd.read_csv("RawData/System2_SensorB.csv")
sensorA_System1_missing = pd.read_csv("RawData/SensorA_System1_missing values.csv")
sensorA_System2_missing = pd.read_csv("RawData/SensorA_System2_missing values.csv")
keyByTestID = pd.read_csv("RawData/Key by TestID.csv", parse_dates=['DateTime'])

2. Preprocesing Steps¶

2.1. Data wrangling on time series¶

In [4]:
# Transpose dataset to make columns as timestamps and rows as tests

# Sensor A
A1_transposed = sensorA_System1.T.reset_index()
A1_transposed.columns = A1_transposed.iloc[0]
A1_transposed.rename(columns={A1_transposed.columns[0]: 'TestID'}, inplace=True)
A1_transposed = A1_transposed.drop(0)
A1_transposed['TestID'] = A1_transposed['TestID'].astype(int)

A2_transposed = sensorA_System2.T.reset_index()
A2_transposed.columns = A2_transposed.iloc[0]
A2_transposed.rename(columns={A2_transposed.columns[0]: 'TestID'}, inplace=True)
A2_transposed = A2_transposed.drop(0)
A2_transposed['TestID'] = A2_transposed['TestID'].astype(int)

A1_missing_transposed = sensorA_System1_missing.T.reset_index()
A1_missing_transposed.columns = A1_missing_transposed.iloc[0]
A1_missing_transposed.rename(columns={A1_missing_transposed.columns[0]: 'TestID'}, inplace=True)
A1_missing_transposed = A1_missing_transposed.drop(0)
A1_missing_transposed['TestID'] = A1_missing_transposed['TestID'].astype(int)

A2_missing_transposed = sensorA_System2_missing.T.reset_index()
A2_missing_transposed.columns = A2_missing_transposed.iloc[0]
A2_missing_transposed.rename(columns={A2_missing_transposed.columns[0]: 'TestID'}, inplace=True)
A2_missing_transposed = A2_missing_transposed.drop(0)
A2_missing_transposed['TestID'] = A2_missing_transposed['TestID'].astype(int)

# Sensor B
B1_transposed = sensorB_System1.T.reset_index()
B1_transposed.columns = B1_transposed.iloc[0]
B1_transposed.rename(columns={B1_transposed.columns[0]: 'TestID'}, inplace=True)
B1_transposed = B1_transposed.drop(0)
B1_transposed['TestID'] = B1_transposed['TestID'].astype(int)

B2_transposed = sensorB_System2.T.reset_index()
B2_transposed.columns = B2_transposed.iloc[0]
B2_transposed.rename(columns={B2_transposed.columns[0]: 'TestID'}, inplace=True)
B2_transposed = B2_transposed.drop(0)
B2_transposed['TestID'] = B2_transposed['TestID'].astype(int)
In [5]:
# Complete A1 and A2 with the missing values
A1_transposed_mid = A1_transposed[~A1_transposed.TestID.isin(A1_missing_transposed.TestID)]
A1_transposed = pd.concat([A1_transposed_mid, A1_missing_transposed], axis=0)
A2_transposed_mid = A2_transposed[~A2_transposed.TestID.isin(A2_missing_transposed.TestID)]
A2_transposed = pd.concat([A2_transposed_mid, A2_missing_transposed], axis=0)
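The two cells above can be followed on a toy example. The sketch assumes that each raw file stores timestamps in its first column and one test per remaining column; the column name `Time`, the helper `transpose_tests` and all values are made up for illustration.

```python
import pandas as pd

# Toy raw layout (an assumption): the first column holds timestamps and
# every other column is one test, named by its TestID.
raw = pd.DataFrame({"Time": [0.0, 0.2], "101": [1.0, 1.5], "102": [2.0, 2.5]})

def transpose_tests(df):
    """Return one row per test with timestamps as columns."""
    out = df.T.reset_index()          # old column names become a column
    out.columns = out.iloc[0]         # the first row holds the timestamps
    out = out.rename(columns={out.columns[0]: "TestID"})
    out = out.drop(0)                 # drop the row that was used as header
    out["TestID"] = out["TestID"].astype(int)
    return out

tests = transpose_tests(raw)

# Toy "missing values" file, already transposed: a corrected copy of test 102.
patch = pd.DataFrame({"TestID": [102], 0.0: [9.0], 0.2: [9.5]})

# Drop the rows that the patch replaces, then append the patched rows.
kept = tests[~tests["TestID"].isin(patch["TestID"])]
patched = pd.concat([kept, patch], axis=0)
print(patched)
```

Filtering before concatenating guarantees that a TestID present in both files appears only once, with the values from the patch file.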

2.2. Data wrangling on additional features¶

In [6]:
# Relabeling System Values
keyByTestID["System"] = keyByTestID["System"].replace({"System 2A":"System 2","System 2B":"System 2"})

# Create new column to fill fluid temperature NA's
# Note: Fluid temperature: If specified, take as the temperature of the sample fluid. The rest of the system temperature can be taken as ambient temperature.
keyByTestID['Fluid_Temperature_Filled'] = keyByTestID['Fluid Temperature'].combine_first(keyByTestID['AmbientTemperature'])

# Binning 

# Categorize 'FluidType' into Blood and Aqueous
keyByTestID['FluidTypeBin'] = np.where(keyByTestID['FluidType'].str.startswith('Eurotrol'), 'Aqueous', 'Blood')

# Categorize 'AgeOfCardInDaysAtTimeOfTest' into bins
keyByTestID["CardAgeBin"] = pd.cut(keyByTestID["AgeOfCardInDaysAtTimeOfTest"], bins=[0, 9, 28, 56, 84, 112, 140, 168, 196, 224, 252],
                                   labels=['[0-9]', '(9-28]', '(28-56]', '(56-84]', '(84-112]', '(112-140]', '(140-168]', '(168-196]', '(196-224]', '(224-252]'])


# Categorize 'Fluid_Temperature_Filled' into bins
keyByTestID["FluidTempBin"] = pd.cut(keyByTestID["Fluid_Temperature_Filled"], bins=[-1, 20, 25, 100], labels=['Below 20', '20-25', 'Above 25'])

# Filtering successful tests
keyByTestID = keyByTestID[keyByTestID['ReturnCode'].isin(['Success','UnderReportableRange'])]
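A detail worth checking in the binning step: `pd.cut` builds right-closed intervals by default, so with `bins=[0, 9, ...]` the first bin is (0, 9] and a value of exactly 0 is left unbinned (NaN), even though its label reads `[0-9]`. The toy series below illustrates this and shows the `include_lowest=True` option; whether any card in the data has an age of exactly 0 is not stated in the source.

```python
import pandas as pd

ages = pd.Series([0, 5, 9, 10, 28, 252])  # toy values, not taken from the data
bins = [0, 9, 28, 56, 84, 112, 140, 168, 196, 224, 252]
labels = ['[0-9]', '(9-28]', '(28-56]', '(56-84]', '(84-112]',
          '(112-140]', '(140-168]', '(168-196]', '(196-224]', '(224-252]']

# Default behaviour: intervals are open on the left, so 0 gets no bin.
default_cut = pd.cut(ages, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left as well.
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"age": ages, "default": default_cut, "inclusive": inclusive_cut}))
```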

2.3. Merging time series datasets to add additional features¶

In [7]:
# Merge dataset with keyByTestID and delete unmatched tests
keyByTestID['TestID'] = keyByTestID['TestID'].astype(int)
keyByTestID['System'] = keyByTestID['System'].astype(str)

A1_keyByTestID = keyByTestID[(keyByTestID['Sensor'] == 'Sensor A') & (keyByTestID['System'] == 'System 1')]
A1_Merged = pd.merge(A1_keyByTestID,A1_transposed,how='inner', on=['TestID'])
A1_transposed = A1_transposed[A1_transposed['TestID'].isin(A1_Merged['TestID'])]

A2_keyByTestID = keyByTestID.loc[(keyByTestID['Sensor'] == 'Sensor A') & (keyByTestID['System'] != 'System 1')]
A2_Merged = pd.merge(A2_keyByTestID,A2_transposed,how='inner', on=['TestID'])
A2_transposed = A2_transposed[A2_transposed['TestID'].isin(A2_Merged['TestID'])]

sensorA_System1 = sensorA_System1.loc[:, sensorA_System1.columns.isin(A1_Merged['TestID'].astype(str))]
sensorA_System2 = sensorA_System2.loc[:, sensorA_System2.columns.isin(A2_Merged['TestID'].astype(str))]


B1_keyByTestID = keyByTestID[(keyByTestID['Sensor'] == 'Sensor B') & (keyByTestID['System'] == 'System 1')]
B1_Merged = pd.merge(B1_keyByTestID,B1_transposed,how='inner', on=['TestID'])
B1_transposed = B1_transposed[B1_transposed['TestID'].isin(B1_Merged['TestID'])]

B2_keyByTestID = keyByTestID.loc[(keyByTestID['Sensor'] == 'Sensor B') & (keyByTestID['System'] != 'System 1')]
B2_Merged = pd.merge(B2_keyByTestID,B2_transposed,how='inner', on=['TestID'])
B2_transposed = B2_transposed[B2_transposed['TestID'].isin(B2_Merged['TestID'])]

sensorB_System1 = sensorB_System1.loc[:, sensorB_System1.columns.isin(B1_Merged['TestID'].astype(str))]
sensorB_System2 = sensorB_System2.loc[:, sensorB_System2.columns.isin(B2_Merged['TestID'].astype(str))]

print('A1: ', A1_Merged.shape)
print('A2: ', A2_Merged.shape)
print('B1: ', B1_Merged.shape)
print('B2: ', B2_Merged.shape)
A1:  (3382, 3380)
A2:  (7743, 3371)
B1:  (3375, 3380)
B2:  (7745, 3371)

2.3.1. Removal of TestIDs that exist in only one sensor¶

In [8]:
# Note: Run this cell only once. Otherwise, restart the kernel and run from the beginning again.
A1_Merged = A1_Merged[A1_Merged["TestID"].isin(B1_Merged["TestID"])]
B1_Merged = B1_Merged[B1_Merged["TestID"].isin(A1_Merged["TestID"])]

A2_Merged = A2_Merged[A2_Merged["TestID"].isin(B2_Merged["TestID"])]
B2_Merged = B2_Merged[B2_Merged["TestID"].isin(A2_Merged["TestID"])]
print('A1: ', A1_Merged.shape)
print('A2: ', A2_Merged.shape)
print('B1: ', B1_Merged.shape)
print('B2: ', B2_Merged.shape)
A1:  (3374, 3380)
A2:  (7743, 3371)
B1:  (3374, 3380)
B2:  (7743, 3371)
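The merge and filtering steps in sections 2.3 and 2.3.1 can be reproduced on toy frames. The sketch below uses made-up TestIDs and a single `t0` column; it keeps only the tests that have both metadata and a time series, and then only the tests present for both sensors.

```python
import pandas as pd

# Toy metadata and toy time series for two sensors (all values made up).
key = pd.DataFrame({"TestID": [1, 2, 3]})
series_a = pd.DataFrame({"TestID": [1, 2, 4], "t0": [0.1, 0.2, 0.4]})
series_b = pd.DataFrame({"TestID": [2, 3, 5], "t0": [1.2, 1.3, 1.5]})

# Inner merge: keep only tests present in both the key and the time series.
merged_a = pd.merge(key, series_a, how="inner", on=["TestID"])  # TestIDs 1, 2
merged_b = pd.merge(key, series_b, how="inner", on=["TestID"])  # TestIDs 2, 3

# Keep only the tests that exist for both sensors.
common = set(merged_a["TestID"]) & set(merged_b["TestID"])
merged_a = merged_a[merged_a["TestID"].isin(common)]
merged_b = merged_b[merged_b["TestID"].isin(common)]

print(merged_a["TestID"].tolist(), merged_b["TestID"].tolist())
```

Computing the intersection first treats both frames symmetrically, so the order in which they are filtered does not matter.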

3. Window extraction¶

In [9]:
# Match window values of Sensor A for each test
calDelimit = 11
cal_window_size = 8
sampleDelimit = 15
sample_window_size = 5

# Sensor A
cal_window_start, cal_window_end, sample_window_start, sample_window_end = calculate_window_values(bubble_start=A1_Merged['BubbleDetectTime'],
                                                                                                   sample_start=A1_Merged['SampleDetectTime'],
                                                                                                   calDelimit_input=calDelimit,
                                                                                                   cal_window_size_input=cal_window_size,
                                                                                                   sampleDelimit_input=sampleDelimit,
                                                                                                   sample_window_size_input=sample_window_size)
A1_Merged['cal_window_start']=cal_window_start
A1_Merged['cal_window_end']=cal_window_end
A1_Merged['sample_window_start']=sample_window_start
A1_Merged['sample_window_end']=sample_window_end


cal_window_start, cal_window_end, sample_window_start, sample_window_end = calculate_window_values(bubble_start=A2_Merged['BubbleDetectTime'],
                                                                                                   sample_start=A2_Merged['SampleDetectTime'],
                                                                                                   calDelimit_input=calDelimit,
                                                                                                   cal_window_size_input=cal_window_size,
                                                                                                   sampleDelimit_input=sampleDelimit,
                                                                                                   sample_window_size_input=sample_window_size)
A2_Merged['cal_window_start']=cal_window_start
A2_Merged['cal_window_end']=cal_window_end
A2_Merged['sample_window_start']=sample_window_start
A2_Merged['sample_window_end']=sample_window_end


# sensor B

# Match window values of Sensor B for each test
calDelimit = 20
cal_window_size = 18
sampleDelimit_blood = 24
sampleDelimit_aqueous = 30
sample_window_size = 4

B1_Merged['cal_window_start'], B1_Merged['cal_window_end'], \
B1_Merged['sample_window_start'], B1_Merged['sample_window_end'] = zip(*B1_Merged.apply(
    lambda row: calculate_window_values(
        bubble_start=row['BubbleDetectTime'],
        sample_start=row['SampleDetectTime'],
        calDelimit_input=calDelimit,
        cal_window_size_input=cal_window_size,
        sampleDelimit_input=sampleDelimit_aqueous if row['FluidType'].startswith('Eurotrol') else sampleDelimit_blood,
        sample_window_size_input=sample_window_size
    ),
    axis=1
))

# For sensor B in system 2, blood and aqueous
B2_Merged['cal_window_start'], B2_Merged['cal_window_end'], \
B2_Merged['sample_window_start'], B2_Merged['sample_window_end'] = zip(*B2_Merged.apply(
    lambda row: calculate_window_values(
        bubble_start=row['BubbleDetectTime'],
        sample_start=row['SampleDetectTime'],
        calDelimit_input=calDelimit,
        cal_window_size_input=cal_window_size,
        sampleDelimit_input=sampleDelimit_aqueous if row['FluidType'].startswith('Eurotrol') else sampleDelimit_blood,
        sample_window_size_input=sample_window_size
    ),
    axis=1
))
In [10]:
# Adds TestIDs as index to the values post-window extraction 
# System 1 - Sensor A

A1_cal_window = []
A1_sample_window = []
for i in range(len(A1_Merged)):
    cal_window, sample_window = calculate_window_data(A1_Merged.iloc[i, :])
    A1_cal_window.append(cal_window.values)
    A1_sample_window.append(sample_window.values)
A1_cal_window = pd.DataFrame(A1_cal_window)
A1_sample_window = pd.DataFrame(A1_sample_window)
A1_cal_window['TestID'] = A1_sample_window['TestID'] = A1_Merged['TestID'].astype(int)
A1_sample_window.set_index('TestID',inplace=True)
A1_cal_window.set_index('TestID',inplace=True)

# System 2 - Sensor A

A2_cal_window = []
A2_sample_window = []
for i in range(len(A2_Merged)):
    cal_window, sample_window = calculate_window_data(A2_Merged.iloc[i, :])
    A2_cal_window.append(cal_window.values)
    A2_sample_window.append(sample_window.values)
A2_cal_window = pd.DataFrame(A2_cal_window)
A2_sample_window = pd.DataFrame(A2_sample_window)
A2_cal_window['TestID'] = A2_sample_window['TestID'] = A2_Merged['TestID'].astype(int)
A2_sample_window.set_index('TestID',inplace=True)
A2_cal_window.set_index('TestID',inplace=True)

# System 1 - Sensor B

B1_cal_window = []
B1_sample_window = []
for i in range(len(B1_Merged)):
    cal_window, sample_window = calculate_window_data(B1_Merged.iloc[i, :])
    B1_cal_window.append(cal_window.values)
    B1_sample_window.append(sample_window.values)
B1_cal_window = pd.DataFrame(B1_cal_window)
B1_sample_window = pd.DataFrame(B1_sample_window)
B1_cal_window['TestID'] = B1_sample_window['TestID'] = B1_Merged['TestID'].astype(int)
B1_sample_window.set_index('TestID',inplace=True)
B1_cal_window.set_index('TestID',inplace=True)

# System 2 - Sensor B

B2_cal_window = []
B2_sample_window = []
for i in range(len(B2_Merged)):
    cal_window, sample_window = calculate_window_data(B2_Merged.iloc[i, :])
    B2_cal_window.append(cal_window.values)
    B2_sample_window.append(sample_window.values)
B2_cal_window = pd.DataFrame(B2_cal_window)
B2_sample_window = pd.DataFrame(B2_sample_window)
B2_cal_window['TestID'] = B2_sample_window['TestID'] = B2_Merged['TestID'].astype(int)
B2_sample_window.set_index('TestID',inplace=True)
B2_cal_window.set_index('TestID',inplace=True)

3.1. Validating if there are partial or full missing values after the extraction¶

In [11]:
A1_cal_window_drop_index = A1_cal_window.loc[A1_cal_window.isna().sum(axis=1)!=0].index
A2_cal_window_drop_index = A2_cal_window.loc[A2_cal_window.isna().sum(axis=1)!=0].index

A1_sample_window_drop_index = A1_sample_window.loc[A1_sample_window.isna().sum(axis=1)!=0].index
A2_sample_window_drop_index = A2_sample_window.loc[A2_sample_window.isna().sum(axis=1)!=0].index

B1_cal_window_drop_index = B1_cal_window.loc[B1_cal_window.isna().sum(axis=1)!=0].index
B2_cal_window_drop_index = B2_cal_window.loc[B2_cal_window.isna().sum(axis=1)!=0].index

B1_sample_window_drop_index = B1_sample_window.loc[B1_sample_window.isna().sum(axis=1)!=0].index
B2_sample_window_drop_index = B2_sample_window.loc[B2_sample_window.isna().sum(axis=1)!=0].index

# Check whether the missing values differ between the calibration and sample windows
print("The missing value in calibration window:",A1_cal_window_drop_index)
print("The missing value in sample window:",A1_sample_window_drop_index)
print("The missing value in calibration window:",A2_cal_window_drop_index)
print("The missing value in sample window:",A2_sample_window_drop_index)

print("The missing value in calibration window:",B1_cal_window_drop_index)
print("The missing value in sample window:",B1_sample_window_drop_index)
print("The missing value in calibration window:",B2_cal_window_drop_index)
print("The missing value in sample window:",B2_sample_window_drop_index)
The missing value in calibration window: Float64Index([], dtype='float64', name='TestID')
The missing value in sample window: Float64Index([], dtype='float64', name='TestID')
The missing value in calibration window: Int64Index([], dtype='int64', name='TestID')
The missing value in sample window: Int64Index([], dtype='int64', name='TestID')
The missing value in calibration window: Float64Index([], dtype='float64', name='TestID')
The missing value in sample window: Float64Index([], dtype='float64', name='TestID')
The missing value in calibration window: Float64Index([], dtype='float64', name='TestID')
The missing value in sample window: Float64Index([], dtype='float64', name='TestID')

3.2. Validating data shape post-window extraction¶

In [12]:
# Set index for Merge datasets
A1_Merged.set_index("TestID", inplace=True)
A2_Merged.set_index("TestID", inplace=True)
B1_Merged.set_index("TestID", inplace=True)
B2_Merged.set_index("TestID", inplace=True)

# Find missing value
print("The problem indexes after extract the window are:",A1_Merged.index.difference(A1_cal_window.index))
print("The problem indexes after extract the window are:",A1_Merged.index.difference(A1_sample_window.index))
print("The problem indexes after extract the window are:",A2_Merged.index.difference(A2_cal_window.index))
print("The problem indexes after extract the window are:",A2_Merged.index.difference(A2_sample_window.index))

print("The problem indexes after extract the window are:",B1_Merged.index.difference(B1_cal_window.index))
print("The problem indexes after extract the window are:",B1_Merged.index.difference(B1_sample_window.index))
print("The problem indexes after extract the window are:",B2_Merged.index.difference(B2_cal_window.index))
print("The problem indexes after extract the window are:",B2_Merged.index.difference(B2_sample_window.index))

A1_Merged = A1_Merged.drop(A1_Merged.index.difference(A1_cal_window.index))
A1_Merged = A1_Merged.drop(A1_Merged.index.difference(A1_sample_window.index))
A2_Merged = A2_Merged.drop(A2_Merged.index.difference(A2_cal_window.index))
A2_Merged = A2_Merged.drop(A2_Merged.index.difference(A2_sample_window.index))

B1_Merged = B1_Merged.drop(B1_Merged.index.difference(B1_cal_window.index))
B1_Merged = B1_Merged.drop(B1_Merged.index.difference(B1_sample_window.index))
B2_Merged = B2_Merged.drop(B2_Merged.index.difference(B2_cal_window.index))
B2_Merged = B2_Merged.drop(B2_Merged.index.difference(B2_sample_window.index))

# Clear the Nan in index of sensor A
A1_cal_window = A1_cal_window[~A1_cal_window.index.isna()]
A1_sample_window = A1_sample_window[~A1_sample_window.index.isna()]
A2_cal_window = A2_cal_window[~A2_cal_window.index.isna()]
A2_sample_window = A2_sample_window[~A2_sample_window.index.isna()]

# Clear the Nan in index of sensor B
B1_cal_window = B1_cal_window[~B1_cal_window.index.isna()]
B1_sample_window = B1_sample_window[~B1_sample_window.index.isna()]
B2_cal_window = B2_cal_window[~B2_cal_window.index.isna()]
B2_sample_window = B2_sample_window[~B2_sample_window.index.isna()]
The problem indexes after extract the window are: Int64Index([12470355, 12470361, 12470365, 12537663, 12539049, 12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([12470355, 12470361, 12470365, 12537663, 12539049, 12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([3518677, 3518678], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([3518677, 3518678], dtype='int64', name='TestID')
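A possible explanation for the float-typed indexes and the "problem indexes" reported above is that pandas aligns on row labels when a Series is assigned as a new column. This is an inference from the printed outputs, not something the source states. The sketch uses toy frames with hypothetical names to show the effect and a positional alternative.

```python
import pandas as pd

# Toy stand-in for a filtered *_Merged frame: row labels are no longer 0..n-1.
merged = pd.DataFrame({"TestID": [11, 12, 13]}, index=[0, 2, 5])

# Toy stand-in for the extracted windows: built from a list, so labels are 0, 1, 2.
windows = pd.DataFrame([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

# Label-aligned assignment: label 1 has no match (NaN) and label 2 receives
# the TestID stored under label 2 of `merged`, not the third row's TestID.
aligned = windows.copy()
aligned["TestID"] = merged["TestID"]

# Positional assignment: row i of `windows` receives the i-th TestID.
positional = windows.copy()
positional["TestID"] = merged["TestID"].to_numpy()

print(aligned)
print(positional)
```

In the toy case the aligned version both loses a TestID and shifts another one, while the positional version keeps the row order produced by the extraction loop. Applying the positional form in the notebook would change the shapes reported in the following cells, so the outputs would need to be regenerated.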
In [13]:
# Shape of the subsets of time series after the extraction from the windows

# Cal Window
print('Shape of the time series after extraction')
print('A1_cal_window: ', A1_cal_window.shape)
print('A2_cal_window: ', A2_cal_window.shape)
print('B1_cal_window: ', B1_cal_window.shape)
print('B2_cal_window: ', B2_cal_window.shape)

# Sample Window
print('A1_sample_window: ', A1_sample_window.shape)
print('A2_sample_window: ', A2_sample_window.shape)
print('B1_sample_window: ', B1_sample_window.shape)
print('B2_sample_window: ', B2_sample_window.shape)

# The unmatched indexes could be deleted, but it is not necessary
Shape of the time series after extraction
A1_cal_window:  (3368, 41)
A2_cal_window:  (7743, 41)
B1_cal_window:  (3373, 91)
B2_cal_window:  (7741, 91)
A1_sample_window:  (3368, 26)
A2_sample_window:  (7743, 26)
B1_sample_window:  (3373, 21)
B2_sample_window:  (7741, 21)
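The column counts above can be cross-checked against the window sizes set in section 3. The check below assumes a sampling interval of 0.2 time units, which is inferred from the shapes and not stated in the source; `points_in_window` is a hypothetical helper.

```python
# Assumed sampling interval (inferred from the column counts, not from the source).
SAMPLING_INTERVAL = 0.2

def points_in_window(window_size, dt=SAMPLING_INTERVAL):
    """Number of samples in a window that includes both of its end points."""
    return round(window_size / dt) + 1

# Window sizes from the window-extraction cell, by sensor and window type.
window_sizes = {
    "A_cal": 8,
    "A_sample": 5,
    "B_cal": 18,
    "B_sample": 4,
}

for name, size in window_sizes.items():
    print(name, points_in_window(size))
```

Under that assumption the four values match the 41, 26, 91 and 21 columns reported for the Sensor A and Sensor B windows.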

3.3. Merging scaled data with additional attributes of interest¶

In [14]:
# Combine data: Merge the time series with "FluidType", "AgeOfCardInDaysAtTimeOfTest", "Fluid_Temperature_Filled", "FluidTypeBin", "CardAgeBin", "FluidTempBin"
A1_cal_window_combine = Merge_data(A1_cal_window,A1_Merged)
A2_cal_window_combine = Merge_data(A2_cal_window,A2_Merged)

B1_cal_window_combine = Merge_data(B1_cal_window,B1_Merged)
B2_cal_window_combine = Merge_data(B2_cal_window,B2_Merged)

## Sample window
A1_sample_window_combine = Merge_data(A1_sample_window,A1_Merged)
A2_sample_window_combine = Merge_data(A2_sample_window,A2_Merged)

B1_sample_window_combine = Merge_data(B1_sample_window,B1_Merged)
B2_sample_window_combine = Merge_data(B2_sample_window,B2_Merged)

3.4. Balancing the specific attributes¶

In [15]:
System1_Index, System2_Index =  balance_index(A1_cal_window_combine,A2_cal_window_combine,"CardAgeBin")
System1 Sensor A & B distribution:
 [0-9]        142
(9-28]       142
(28-56]      142
(56-84]      142
(84-112]     142
(112-140]    142
(140-168]    142
(168-196]    142
(196-224]    142
(224-252]    142
Name: CardAgeBin, dtype: int64

 System2 Sensor A & B distribution:
 [0-9]        142
(9-28]       142
(28-56]      142
(56-84]      142
(84-112]     142
(112-140]    142
(140-168]    142
(168-196]    142
(196-224]    142
(224-252]    142
Name: CardAgeBin, dtype: int64
In [16]:
# Balanced data
A1_cal_window_combine_balanced = A1_cal_window_combine.loc[System1_Index]
A1_sample_window_combine_balanced = A1_sample_window_combine.loc[System1_Index]
A2_cal_window_combine_balanced = A2_cal_window_combine.loc[System2_Index]
A2_sample_window_combine_balanced = A2_sample_window_combine.loc[System2_Index]

B1_cal_window_combine_balanced = B1_cal_window_combine.loc[System1_Index]
B1_sample_window_combine_balanced = B1_sample_window_combine.loc[System1_Index]
B2_cal_window_combine_balanced = B2_cal_window_combine.loc[System2_Index]
B2_sample_window_combine_balanced = B2_sample_window_combine.loc[System2_Index]
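The implementation of `balance_index` is not shown in this notebook. A minimal sketch of one way to obtain equal counts per category in both systems is to downsample every category to the smallest count found in either frame. The function `balanced_indexes` and the toy frames are hypothetical and may differ from the project's own helper.

```python
import pandas as pd

def balanced_indexes(df1, df2, column, seed=0):
    """Sample the same number of rows per category from both frames."""
    # Smallest category count across the two frames.
    n = int(min(df1[column].value_counts().min(),
                df2[column].value_counts().min()))
    idx1 = df1.groupby(column).sample(n=n, random_state=seed).index
    idx2 = df2.groupby(column).sample(n=n, random_state=seed).index
    return idx1, idx2

# Toy frames indexed by made-up TestIDs, with plain string categories.
s1 = pd.DataFrame({"CardAgeBin": ["[0-9]"] * 3 + ["(9-28]"] * 2},
                  index=[10, 11, 12, 13, 14])
s2 = pd.DataFrame({"CardAgeBin": ["[0-9]"] * 4 + ["(9-28]"] * 5},
                  index=range(20, 29))

idx1, idx2 = balanced_indexes(s1, s2, "CardAgeBin")
print(s1.loc[idx1, "CardAgeBin"].value_counts())
print(s2.loc[idx2, "CardAgeBin"].value_counts())
```

Because the returned indexes are TestIDs, the same index can be reused to select the matching rows from the other sensor and the other window, as done in the cell above.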

3.5. Windows visualization¶

Card Age (CardAgeBin)¶

System 1 and System 2: Sensor A - Cal and Sample Windows¶

In [17]:
# Plot all the balanced time series from the window extraction
plot_all_time_series_in_group(A1_cal_window_combine_balanced, A1_sample_window_combine_balanced, A2_cal_window_combine_balanced, A2_sample_window_combine_balanced, "CardAgeBin", "A1_cal_window_combine", "A1_sample_window_combine","A2_blood_cal_window_combine", "A2_sample_window_combine")

System 1 and System 2: Sensor B - Cal and Sample Windows¶

In [18]:
# Plot all the balanced time series from the window extraction
plot_all_time_series_in_group(B1_cal_window_combine_balanced, B1_sample_window_combine_balanced, B2_cal_window_combine_balanced, B2_sample_window_combine_balanced, "CardAgeBin", "B1_cal_window_combine", "B1_sample_window_combine", "B2_blood_cal_window_combine", "B2_sample_window_combine")

4. FPCA characterization¶

4.1. Functional PC1 plots (both systems) and Characterization of FPC Scores¶

The following section presents:

  1. Explained variance
  2. The waveforms that contribute most to each component
  3. Plots 1-2: all the waveforms and the mean waveform after aggregation
  4. Plots 3-4: the first two components in each system
  5. Plot 5: the first component of both systems on the same canvas
  6. Plot 6: the bootstrap confidence interval for both systems
  7. Plots 7-8: boxplots for both systems showing the percentiles of the first component
    • Red dashed lines indicate detected outliers
    • Red area shows the box region
  8. Plots 9-12: visualization of the PC scores
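To make the terms explained variance, components and scores concrete, the sketch below runs a principal component analysis on discretized toy curves using a plain NumPy SVD. It is a stand-in for illustration only: it is not the scikit-fda `FPCA` class and not the project's `fpca_two_inputs` helper, and the curves are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 8.0, 41)                     # toy time grid

# Toy curves: a common level plus a random offset and a small random slope.
offsets = rng.normal(0.0, 1.0, size=(50, 1))
slopes = rng.normal(0.0, 0.05, size=(50, 1))
curves = 10.0 + offsets + slopes * t              # shape (50, 41)

# Center the curves and decompose them with an SVD.
centered = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

explained = s**2 / np.sum(s**2)                   # explained variance ratio
components = Vt[:2]                               # first two components on the grid
scores = centered @ components.T                  # PC scores, one row per curve

print("PC1 explained variance (%):", 100 * explained[0])
print("PC2 explained variance (%):", 100 * explained[1])
```

In this toy setting the curves differ mostly by a vertical offset, so the first component captures most of the variance. Whether the same mechanism is behind the very high PC1 percentages reported below is not established by this sketch.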

System 1 versus System 2: Sensor A - Cal Window¶

In [19]:
pc_scores_s1_A_cal_window, pc_scores_s2_A_cal_window,fpca_s1_A_cal_window,fpca_s2_A_cal_window = fpca_two_inputs(A1_cal_window_combine_balanced.iloc[:,:-6], A2_cal_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
ac1, ac2 = bootstrap(A1_cal_window_combine_balanced, A2_cal_window_combine_balanced,"A","cal_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_A_cal_window, pc_scores_s2_A_cal_window, A1_cal_window_combine_balanced, A2_cal_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.9990678032829
S1 Explain variance PC2 (%):  0.0009287030757880102
S2 Explain variance PC1 (%):  99.99921606779829
S2 Explain variance PC2 (%):  0.0007826074950738559
The time series contributing most to PC1 is at index 592 with TestID 12557583.0
The time series contributing most to PC2 is at index 800 with TestID 12529762.0
The time series contributing most to PC1 is at index 1274 with TestID 3572012
The time series contributing most to PC2 is at index 91 with TestID 3568638
/Users/nayemontiel18/Library/CloudStorage/OneDrive-UBC/UBCO/MDS/CAPSTONE_PROJECT/Data/Python/Final/functionalPCA.py:180: UserWarning: Attempting to set identical low and high ylims makes transformation singular; automatically expanding.
  plt.ylim(global_y_FPC1_min, global_y_FPC1_max)
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------
Out[19]:

System 1 versus System 2: Sensor A - Sample Window¶

In [20]:
pc_scores_s1_A_sample_window, pc_scores_s2_A_sample_window,fpca_s1_A_sample_window,fpca_s2_A_sample_window = fpca_two_inputs(A1_sample_window_combine_balanced.iloc[:,:-6], A2_sample_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
as1,as2 = bootstrap(A1_sample_window_combine_balanced, A2_sample_window_combine_balanced,"A","sample_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_A_sample_window, pc_scores_s2_A_sample_window, A1_sample_window_combine_balanced, A2_sample_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.99971295089486
S1 Explain variance PC2 (%):  0.0002834456263569399
S2 Explain variance PC1 (%):  99.99971883909066
S2 Explain variance PC2 (%):  0.00027994513655736327
The time series contributing most to PC1 is at index 948 with TestID 12573896.0
The time series contributing most to PC2 is at index 800 with TestID 12529762.0
The time series contributing most to PC1 is at index 1152 with TestID 3572286
The time series contributing most to PC2 is at index 140 with TestID 3568703
/Users/nayemontiel18/Library/CloudStorage/OneDrive-UBC/UBCO/MDS/CAPSTONE_PROJECT/Data/Python/Final/functionalPCA.py:180: UserWarning: Attempting to set identical low and high ylims makes transformation singular; automatically expanding.
  plt.ylim(global_y_FPC1_min, global_y_FPC1_max)
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------
Out[20]:

System 1 versus System 2: Sensor B - Cal Window¶

In [21]:
pc_scores_s1_B_cal_window, pc_scores_s2_B_cal_window,fpca_s1_B_cal_window,fpca_s2_B_cal_window = fpca_two_inputs(B1_cal_window_combine_balanced.iloc[:,:-6], B2_cal_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
bc1,bc2 = bootstrap(B1_cal_window_combine_balanced, B2_cal_window_combine_balanced,"B","cal_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_B_cal_window, pc_scores_s2_B_cal_window, B1_cal_window_combine_balanced, B2_cal_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.9924050676022
S1 Explain variance PC2 (%):  0.00756531585631101
S2 Explain variance PC1 (%):  99.98991709411045
S2 Explain variance PC2 (%):  0.010053295941146666
The time series contributing most to PC1 is at index 133 with TestID 12544066.0
The time series contributing most to PC2 is at index 82 with TestID 12615989.0
The time series contributing most to PC1 is at index 425 with TestID 3556323.0
The time series contributing most to PC2 is at index 53 with TestID 3565690.0
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------
Out[21]:

System 1 versus System 2: Sensor B - Sample Window¶

In [22]:
pc_scores_s1_B_sample_window, pc_scores_s2_B_sample_window,fpca_s1_B_sample_window,fpca_s2_B_sample_window = fpca_two_inputs(B1_sample_window_combine_balanced.iloc[:,:-6], B2_sample_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
bs1,bs2 = bootstrap(B1_sample_window_combine_balanced, B2_sample_window_combine_balanced, "B","sample_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_B_sample_window, pc_scores_s2_B_sample_window, B1_sample_window_combine_balanced, B2_sample_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.9983476221513
S1 Explain variance PC2 (%):  0.0016416333798515348
S2 Explain variance PC1 (%):  99.99862924339853
S2 Explain variance PC2 (%):  0.0013666399895473865
The time series contributing most to PC1 is at index 78 with TestID 12546583.0
The time series contributing most to PC2 is at index 684 with TestID 12191141.0
The time series contributing most to PC1 is at index 105 with TestID 3560142.0
The time series contributing most to PC2 is at index 666 with TestID 3518710.0
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------
Out[22]:

4.2 Linear Regression for slope¶

R-squared and visualization¶

In [23]:
# Collect the slopes returned by each regression, one row per window
df_list = []

def append_to_dataframe(window_name, slope1, slope2):
    """Record the two slopes (System 1, System 2) for one sensor window."""
    df_list.append({'Window': window_name, 'Slope 1': slope1, 'Slope 2': slope2})

# visualize_regression returns two slopes, unpacked into slope1 and slope2
append_to_dataframe('A_cal_window', *visualize_regression(fpca_s1_A_cal_window, fpca_s2_A_cal_window))
append_to_dataframe('A_sample_window', *visualize_regression(fpca_s1_A_sample_window, fpca_s2_A_sample_window))
append_to_dataframe('B_cal_window', *visualize_regression(fpca_s1_B_cal_window, fpca_s2_B_cal_window))
append_to_dataframe('B_sample_window', *visualize_regression(fpca_s1_B_sample_window, fpca_s2_B_sample_window))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.928
Model:                            OLS   Adj. R-squared:                  0.926
Method:                 Least Squares   F-statistic:                     499.0
Date:                Wed, 12 Jun 2024   Prob (F-statistic):           7.84e-24
Time:                        21:43:42   Log-Likelihood:                 480.75
No. Observations:                  41   AIC:                            -957.5
Df Residuals:                      39   BIC:                            -954.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1581   6.15e-07  -2.57e+05      0.000      -0.158      -0.158
x1          5.913e-07   2.65e-08     22.339      0.000    5.38e-07    6.45e-07
==============================================================================
Omnibus:                        4.269   Durbin-Watson:                   0.346
Prob(Omnibus):                  0.118   Jarque-Bera (JB):                2.726
Skew:                          -0.440   Prob(JB):                        0.256
Kurtosis:                       2.094   Cond. No.                         45.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 6.521e+04
Date:                Wed, 12 Jun 2024   Prob (F-statistic):           1.76e-64
Time:                        21:43:42   Log-Likelihood:                 502.66
No. Observations:                  41   AIC:                            -1001.
Df Residuals:                      39   BIC:                            -997.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1582    3.6e-07  -4.39e+05      0.000      -0.158      -0.158
x1          3.961e-06   1.55e-08    255.364      0.000    3.93e-06    3.99e-06
==============================================================================
Omnibus:                        2.808   Durbin-Watson:                   0.279
Prob(Omnibus):                  0.246   Jarque-Bera (JB):                2.625
Skew:                          -0.582   Prob(JB):                        0.269
Kurtosis:                       2.572   Cond. No.                         45.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.990
Model:                            OLS   Adj. R-squared:                  0.989
Method:                 Least Squares   F-statistic:                     2328.
Date:                Wed, 12 Jun 2024   Prob (F-statistic):           2.07e-25
Time:                        21:43:42   Log-Likelihood:                 288.65
No. Observations:                  26   AIC:                            -573.3
Df Residuals:                      24   BIC:                            -570.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1999   1.45e-06  -1.38e+05      0.000      -0.200      -0.200
x1         -4.792e-06   9.93e-08    -48.247      0.000      -5e-06   -4.59e-06
==============================================================================
Omnibus:                        1.782   Durbin-Watson:                   0.222
Prob(Omnibus):                  0.410   Jarque-Bera (JB):                1.587
Skew:                           0.510   Prob(JB):                        0.452
Kurtosis:                       2.349   Cond. No.                         28.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 2.191e+04
Date:                Wed, 12 Jun 2024   Prob (F-statistic):           4.75e-37
Time:                        21:43:42   Log-Likelihood:                 321.26
No. Observations:                  26   AIC:                            -638.5
Df Residuals:                      24   BIC:                            -636.0
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2001   4.13e-07  -4.84e+05      0.000      -0.200      -0.200
x1          4.195e-06   2.83e-08    148.025      0.000    4.14e-06    4.25e-06
==============================================================================
Omnibus:                        2.679   Durbin-Watson:                   0.414
Prob(Omnibus):                  0.262   Jarque-Bera (JB):                2.339
Skew:                           0.681   Prob(JB):                        0.311
Kurtosis:                       2.451   Cond. No.                         28.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 7.382e+05
Date:                Wed, 12 Jun 2024   Prob (F-statistic):          3.45e-176
Time:                        21:43:43   Log-Likelihood:                 1028.7
No. Observations:                  91   AIC:                            -2053.
Df Residuals:                      89   BIC:                            -2048.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.1059   6.27e-07  -1.69e+05      0.000      -0.106      -0.106
x1          1.034e-05    1.2e-08    859.185      0.000    1.03e-05    1.04e-05
==============================================================================
Omnibus:                        3.363   Durbin-Watson:                   0.223
Prob(Omnibus):                  0.186   Jarque-Bera (JB):                3.357
Skew:                          -0.447   Prob(JB):                        0.187
Kurtosis:                       2.704   Cond. No.                         103.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 3.213e+06
Date:                Wed, 12 Jun 2024   Prob (F-statistic):          1.31e-204
Time:                        21:43:43   Log-Likelihood:                 1085.1
No. Observations:                  91   AIC:                            -2166.
Df Residuals:                      89   BIC:                            -2161.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.1059   3.37e-07   3.14e+05      0.000       0.106       0.106
x1         -1.159e-05   6.47e-09  -1792.424      0.000   -1.16e-05   -1.16e-05
==============================================================================
Omnibus:                        3.505   Durbin-Watson:                   0.304
Prob(Omnibus):                  0.173   Jarque-Bera (JB):                3.133
Skew:                           0.368   Prob(JB):                        0.209
Kurtosis:                       2.466   Cond. No.                         103.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.996
Method:                 Least Squares   F-statistic:                     4620.
Date:                Wed, 12 Jun 2024   Prob (F-statistic):           3.76e-24
Time:                        21:43:43   Log-Likelihood:                 198.85
No. Observations:                  21   AIC:                            -393.7
Df Residuals:                      19   BIC:                            -391.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2231   8.28e-06   -2.7e+04      0.000      -0.223      -0.223
x1         -4.812e-05   7.08e-07    -67.971      0.000   -4.96e-05   -4.66e-05
==============================================================================
Omnibus:                        2.522   Durbin-Watson:                   0.137
Prob(Omnibus):                  0.283   Jarque-Bera (JB):                1.955
Skew:                           0.608   Prob(JB):                        0.376
Kurtosis:                       2.130   Cond. No.                         22.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     2668.
Date:                Wed, 12 Jun 2024   Prob (F-statistic):           6.74e-22
Time:                        21:43:43   Log-Likelihood:                 201.50
No. Observations:                  21   AIC:                            -399.0
Df Residuals:                      19   BIC:                            -396.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2233   7.29e-06  -3.06e+04      0.000      -0.223      -0.223
x1         -3.222e-05   6.24e-07    -51.650      0.000   -3.35e-05   -3.09e-05
==============================================================================
Omnibus:                        2.436   Durbin-Watson:                   0.128
Prob(Omnibus):                  0.296   Jarque-Bera (JB):                1.970
Skew:                           0.634   Prob(JB):                        0.374
Kurtosis:                       2.197   Cond. No.                         22.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
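
The observation counts in the summaries above (41, 26, 91 and 21) appear to match the number of grid points in each window, which suggests that each fit regresses a component curve on its time index. The sketch below shows that kind of fit with plain NumPy, under that assumption. `slope_and_r2` and `demo_component` are hypothetical names with synthetic values; this is not the `visualize_regression` helper or its data.

```python
import numpy as np

def slope_and_r2(values):
    """Fit values ~ intercept + slope * t by least squares, with t = 0, 1, ..., n-1."""
    y = np.asarray(values, dtype=float)
    t = np.arange(y.size, dtype=float)
    X = np.column_stack([np.ones_like(t), t])  # design matrix: intercept and time index
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)            # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    return float(beta[0]), float(beta[1]), 1.0 - ss_res / ss_tot

# Synthetic (hypothetical) component: almost flat, with a small linear drift
demo_component = -0.158 + 6e-07 * np.arange(41)
intercept, slope, r2 = slope_and_r2(demo_component)
print(f"intercept={intercept:.4f}, slope={slope:.3e}, R-squared={r2:.3f}")
```

A slope close to zero together with a high R-squared means the curve is nearly flat but its small drift is almost perfectly linear, so the sign and size of the slope are the quantities worth comparing between systems.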

Slopes Results Comparison for one sample¶

In [24]:
slopes_df = pd.DataFrame(df_list)
slopes_df
Out[24]:
            Window       Slope 1   Slope 2
0     A_cal_window  5.913307e-07  0.000004
1  A_sample_window -4.792280e-06  0.000004
2     B_cal_window  1.033637e-05 -0.000012
3  B_sample_window -4.811958e-05 -0.000032

5. Functional Regression¶

Functional regression is a second functional analysis method. Unlike FPCA, it uses each entire time series from the balanced, centered dataset as the response and regresses it on the features in their original form, before they are grouped into bins. The aim is to distinguish the two systems while accounting for the influence of those features.
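
To make the idea concrete, here is a minimal sketch of a function-on-scalar regression, y_i(t) = b0(t) + b1(t) * x_i + error, fitted by ordinary least squares separately at every time point. This swaps in a pointwise NumPy fit for illustration; it is not the `Function_regression` helper, which (judging by its output) represents the coefficients in a Fourier basis. All names and data here are hypothetical and synthetic.

```python
import numpy as np

def pointwise_function_on_scalar(curves, covariate):
    """Least-squares fit of curves[i, j] ~ b0[j] + b1[j] * covariate[i] at each grid point j."""
    Y = np.asarray(curves, dtype=float)       # shape (n_curves, n_points)
    x = np.asarray(covariate, dtype=float)    # shape (n_curves,)
    X = np.column_stack([np.ones_like(x), x])  # intercept and scalar feature
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (2, n_points)
    return coef[0], coef[1]

# Synthetic data: 60 curves on a 41-point grid with a known coefficient function
rng = np.random.default_rng(0)
t = np.linspace(0.0, 40.0, 41)
card_age = rng.uniform(0.0, 200.0, size=60)           # hypothetical covariate values
true_intercept = 10.0 + np.sin(2 * np.pi * t / 40.0)
true_slope = -0.05 + 0.001 * t
noise = rng.normal(0.0, 0.01, size=(60, t.size))
curves = true_intercept + np.outer(card_age, true_slope) + noise

beta0_hat, beta1_hat = pointwise_function_on_scalar(curves, card_age)
print(np.max(np.abs(beta1_hat - true_slope)))  # largest error in the estimated slope function
```

Fitting one such model per system and comparing the estimated coefficient functions is one way to see whether a feature affects the two systems differently over time.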

5.1. Regression coefficients¶

These are the coefficient functions returned by the model. Because the coefficients differ by orders of magnitude, we need to choose which time stamps to plot before visualizing them.
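
One way to inspect a coefficient function at chosen time stamps is to evaluate its basis expansion only at those points. The sketch below does this for an orthonormal Fourier basis ordered as constant, sin, cos, sin, cos, and so on. That ordering and normalization are assumptions and may not match scikit-fda's `FourierBasis` exactly. `fourier_design`, `evaluate_at` and the demo coefficients are hypothetical, not values from the model output.

```python
import numpy as np

def fourier_design(t, n_basis, period):
    """Evaluate an orthonormal Fourier basis (constant, then sin/cos pairs) at times t."""
    if n_basis % 2 == 0:
        raise ValueError("n_basis must be odd: one constant plus sin/cos pairs")
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period
    columns = [np.full_like(t, 1.0 / np.sqrt(period))]
    for k in range(1, (n_basis - 1) // 2 + 1):
        columns.append(np.sqrt(2.0 / period) * np.sin(k * omega * t))
        columns.append(np.sqrt(2.0 / period) * np.cos(k * omega * t))
    return np.column_stack(columns)  # shape (len(t), n_basis)

def evaluate_at(coefficients, time_stamps, period):
    """Value of the basis expansion at the selected time stamps."""
    c = np.asarray(coefficients, dtype=float)
    return fourier_design(time_stamps, c.size, period) @ c

# Hypothetical coefficient vector with 5 basis functions on a domain of length 40
demo_coefficients = np.array([-2.0, 0.01, -0.003, 0.002, 0.0005])
selected_time_stamps = np.array([0.0, 10.0, 20.0, 30.0])
beta_values = evaluate_at(demo_coefficients, selected_time_stamps, period=40.0)
print(beta_values)
```

Evaluating on a few selected time stamps keeps the plotted values on a comparable scale, whereas the raw basis coefficients mix a large constant term with much smaller oscillating terms.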

Sensor A¶

Cal window¶

In [25]:
print("System 1:")
A1_cal_window_funct_reg = Function_regression(A1_cal_window_combine_balanced,40,['AgeOfCardInDaysAtTimeOfTest'])
print("----------------------------------------------------------------------------")
print("\n","System 2:")
A2_cal_window_funct_reg = Function_regression(A2_cal_window_combine_balanced,40,['AgeOfCardInDaysAtTimeOfTest'])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 40.0),), n_basis=41, period=40.0),
    coefficients=[[ 1.12969934e+02 -2.54087869e-01 -6.16478393e-02  4.26477462e-02
      -1.55955210e-01 -2.04932193e-01  6.98070470e-02 -1.07249592e-02
      -1.97358451e-01  2.62929686e-01  6.94883436e-02 -2.02584680e-01
       1.98420073e-01 -7.16210555e-02  1.68070284e-01 -4.90006365e-02
       2.98174620e-01 -9.36189532e-03  2.26980579e-01 -4.54559332e-01
       9.45223291e-02  1.08290355e-01  8.54976365e-02 -1.09581495e-01
       1.06591476e-02 -9.73530024e-03  2.03426158e-03 -1.90255820e-01
       3.99508688e-02 -4.78771163e-01  1.73005869e-01  4.45503135e-03
      -3.30847265e-01  1.85403260e-01  3.05801586e-02 -2.28451401e-01
       5.71844662e-02 -2.75866281e-01  7.50141766e-02 -1.85441728e+14
      -4.65345409e-01]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 40.0),), n_basis=41, period=40.0),
    coefficients=[[-2.63794588e+00 -5.15516832e-03  1.41558180e-03 -4.31592011e-03
      -1.67517791e-03 -1.26543761e-03 -1.82353381e-04 -1.71764618e-03
      -3.36598606e-03  6.41940225e-03  3.75557177e-04  2.18777832e-04
       3.53878654e-03 -1.47400343e-03 -3.44271340e-04 -1.24052737e-03
       3.51891726e-03  1.31600689e-03  6.31958054e-04 -1.18780137e-02
       7.94026118e-04  8.87788575e-04 -3.12602046e-03 -5.49175397e-04
      -7.52476552e-04 -2.30663016e-03 -1.10132979e-03  2.25616407e-04
       5.52379376e-03 -7.13771225e-03 -1.83260609e-03 -2.43191772e-03
      -4.99579248e-03 -4.21760416e-03 -5.85762138e-04 -1.99369278e-03
      -2.38713482e-03 -5.10136445e-03 -1.45882602e-03 -3.89211385e+12
      -1.09561122e-02]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 40.0),), n_basis=41, period=40.0),
    coefficients=[[ 1.61334295e+02 -8.02754388e-02  1.72225788e-01 -2.69479690e-01
      -8.14106916e-02 -1.37047254e-01  1.00099835e-01  5.61845541e-03
      -1.68777631e-01  3.97094202e-01  1.77476466e-01 -4.94631428e-02
       3.08787178e-01 -1.92243756e-01  1.64548803e-01  2.53289590e-01
       2.21945871e-01  1.54356569e-01  1.97794098e-01 -6.66500544e-01
       4.41026633e-02 -4.29601821e-02  2.09660518e-02 -1.34349701e-01
      -3.49200465e-02  1.31442408e-01 -1.13435424e-01 -5.82585730e-02
       2.74313379e-01 -1.63342670e-01 -7.70645794e-02 -1.46698164e-01
      -1.16283865e-01 -4.21019224e-02  1.35507042e-01 -2.93771857e-01
      -7.00428325e-02 -2.31147371e-01 -3.54957394e-01 -1.68051439e+14
      -4.25991048e-01]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 40.0),), n_basis=41, period=40.0),
    coefficients=[[-2.58762788e+00 -7.56787084e-03 -3.08935264e-05 -2.72882602e-03
      -2.22761942e-03 -2.10797871e-03 -1.92915150e-04 -2.15232851e-03
      -4.61978546e-03  6.74288771e-03 -3.22716449e-04 -1.17488856e-03
       3.54534466e-03 -1.17028520e-03 -2.06940198e-04 -3.45497680e-03
       4.48881079e-03 -2.71422688e-06  1.13400451e-03 -1.25112921e-02
       1.50221791e-03  2.22491660e-03 -3.37105031e-03 -7.57610757e-04
      -3.66440707e-04 -3.80545943e-03 -4.51773878e-04 -8.71571238e-04
       4.83451634e-03 -1.10976566e-02  3.24722776e-05 -1.74911907e-03
      -7.54524651e-03 -3.25381453e-03 -1.43324553e-03 -2.49325682e-03
      -1.83695477e-03 -6.57235240e-03  1.35713961e-03 -4.89340787e+12
      -1.35647941e-02]]) 

Sample window¶

In [26]:
print("System 1:")
A1_sample_window_funct_reg = Function_regression(A1_sample_window_combine_balanced,25,["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n","System 2:")
A2_sample_window_funct_reg = Function_regression(A2_sample_window_combine_balanced,25,["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 25.0),), n_basis=25, period=25.0),
    coefficients=[[8.61205375e+01 1.27088143e-01 1.91328493e-02 6.35335880e-02
      5.06505700e-03 4.12161405e-02 2.54016366e-03 3.00481128e-02
      1.01510836e-03 2.29581338e-02 9.43264471e-04 1.70733857e-02
      7.30546035e-04 1.30920595e-02 4.25301550e-04 1.06563585e-02
      4.68470296e-04 7.46018237e-03 5.95669041e-04 5.43748127e-03
      8.96060131e-05 3.22866589e-03 4.07170310e-04 1.44224073e-03
      2.19040425e-04]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 25.0),), n_basis=25, period=25.0),
    coefficients=[[-1.96813078e+00 -1.81712557e-03  6.61561593e-05 -8.89158533e-04
       1.63223556e-05 -5.72415295e-04  7.71483667e-06 -4.17648193e-04
       6.88451146e-06 -3.16525794e-04  7.13968548e-08 -2.38887920e-04
       3.88757879e-06 -1.83242005e-04 -3.46688201e-07 -1.51261251e-04
       4.23082763e-06 -1.07984486e-04  2.34002348e-06 -7.25199767e-05
      -4.92191161e-07 -4.70796773e-05  1.44940743e-06 -1.54952369e-05
       1.33581066e-06]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 25.0),), n_basis=25, period=25.0),
    coefficients=[[ 1.23126417e+02 -8.61418411e-02 -1.89468773e-02 -4.25824381e-02
      -4.60557400e-03 -2.76141653e-02 -2.67756783e-03 -2.03621323e-02
      -8.36259672e-04 -1.51373008e-02 -4.28208855e-04 -1.18295509e-02
      -1.14644130e-03 -8.87857109e-03 -3.11954713e-04 -7.21049439e-03
      -3.18110776e-04 -5.43040279e-03 -3.07131281e-04 -3.57209200e-03
      -9.24267796e-05 -2.48056330e-03 -1.02028602e-04 -8.81542934e-04
      -2.59116284e-04]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 25.0),), n_basis=25, period=25.0),
    coefficients=[[-1.91388167e+00 -2.78684893e-03  3.37983086e-06 -1.36922091e-03
       2.22979523e-06 -8.88522860e-04  5.88645059e-06 -6.38543032e-04
      -6.28762595e-07 -4.85992115e-04 -1.06337687e-06 -3.73584550e-04
       3.75269117e-06 -2.93053687e-04 -1.99890583e-06 -2.21387871e-04
      -8.12409589e-07 -1.65172120e-04 -2.13276091e-06 -1.14396246e-04
      -1.03648358e-06 -6.38227956e-05 -2.42994622e-07 -2.05659833e-05
       2.07265203e-06]]) 

Sensor B¶

Cal window¶

In [27]:
print("System 1:")
B1_cal_window_funct_reg = Function_regression(B1_cal_window_combine_balanced,90,["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n","System 2:")
B2_cal_window_funct_reg = Function_regression(B2_cal_window_combine_balanced,90,["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 90.0),), n_basis=91, period=90.0),
    coefficients=[[-2.65583522e+03 -3.30129041e+00 -9.30975751e+00  2.83933817e+00
      -6.07673825e+00  7.82963792e+00 -2.86916728e+00  3.53985118e+00
       1.18964728e+00 -3.38110939e+00  4.12124493e+00 -3.35693442e+00
      -8.26391414e+00  1.22183372e+00  3.99370546e+00 -3.64919518e+00
      -4.80260271e+00 -1.15372824e+00 -2.30116974e+00  3.22014780e+00
      -1.46994983e+01  1.66019841e+01  1.65207352e+01 -1.31892215e+01
      -1.10048244e+01  2.64561780e+00 -8.18122621e+00  9.62968584e+00
      -4.40665826e+00  9.68428087e+00 -4.35548230e+00  1.46960509e+01
       5.99791635e+00 -2.60883738e+00  8.27400875e+00  3.38760283e+00
       1.77457892e+00  4.08949657e+00  4.78311751e+00  2.02944618e+00
       4.07984148e+00  4.31230003e+00  2.59722228e-01 -7.90494435e+00
       2.91802719e+00 -7.01100420e+00 -2.78555547e+00  1.10457704e+01
       7.16237017e-01  5.55046550e+00 -4.85947499e+00  8.44217564e+00
       3.71262914e+00  8.23434138e+00 -5.44863379e-01  1.37460693e+01
       1.43268139e+01 -3.61790851e+00  1.43356239e+01 -2.88599518e+00
       1.01133348e+01 -9.27232994e+00  1.15873765e+01 -4.66189277e+00
       1.01565354e+01 -1.37784970e+01  7.22421391e-01 -1.05595476e+01
       7.54299853e-01 -4.72556207e+00  5.21196299e+00 -7.67618593e+00
      -3.96953268e+00 -8.82583750e+00 -4.24104492e-01 -8.62666925e+00
       1.04774391e+00 -5.31532520e+00 -9.76943755e+00  4.19329414e+00
       2.64556374e+00 -1.11608374e+01  7.15846389e-01 -6.42714208e+00
       7.13414484e-01 -1.38229036e+01 -4.97710215e+00 -2.09547778e+01
       7.22091356e-01 -5.70982481e+15 -3.45076589e+01]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 90.0),), n_basis=91, period=90.0),
    coefficients=[[-3.31651755e-01 -2.94792336e-03  3.14315927e-03  1.62255145e-03
       4.00858154e-03 -1.46635782e-02  2.37878949e-03  1.18818738e-02
       1.79049083e-04 -2.02367135e-03  5.65992769e-03  2.35617973e-03
       6.88204908e-04  7.05487576e-03  3.39855309e-03  5.79808060e-03
      -5.52904516e-03 -4.84233616e-03 -1.21104718e-03 -4.74649217e-03
       2.46276065e-04 -1.00455288e-02  3.09314123e-03 -2.57243367e-03
       2.30641458e-03  4.06394024e-03 -6.91609936e-03 -3.71522473e-04
      -1.22110936e-02  6.63799376e-04  4.00035835e-03 -7.06859556e-04
      -3.11538868e-03  2.99058131e-03  7.05883003e-03  1.32255870e-03
       2.72368613e-03 -5.14154634e-03  4.29166807e-03 -4.77162746e-03
       6.61315019e-03  1.37311164e-03  2.43634741e-03 -6.36826107e-03
      -3.90252021e-03 -8.40565807e-03 -3.42672286e-03  4.87598004e-03
      -2.94074454e-03 -4.92381868e-03  2.99569106e-03  1.51647617e-03
       7.73966547e-03 -7.88366079e-03 -2.42224671e-04  8.30337449e-04
       3.91696717e-03 -2.49712154e-03 -9.21312378e-04 -6.65907289e-03
       2.32340723e-03  3.73924388e-03  8.44186072e-04 -2.56339087e-03
       5.02932912e-03  8.22207176e-03 -4.07845725e-03 -1.63638648e-03
       2.82158791e-03  1.98338476e-03 -3.79783441e-03 -8.32099776e-04
      -2.23700383e-05  3.44410157e-03  9.43511484e-04 -1.98600599e-03
      -7.75107556e-03 -9.55118975e-03 -1.50231569e-04 -5.55701060e-03
      -4.71877800e-03  1.51955942e-03 -4.21506964e-03 -7.75361478e-03
      -2.42049600e-03  3.75292431e-03  1.12166094e-02  3.09235396e-03
      -1.16694835e-02 -1.06077352e+12 -5.51165441e-03]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 90.0),), n_basis=91, period=90.0),
    coefficients=[[-2.77447779e+03 -5.02834393e+00 -1.05521277e+01  4.15985980e+00
      -6.83426836e+00  6.84875447e+00 -2.95840391e+00  7.38202847e+00
       1.18766610e+00 -4.42496374e+00  5.31783797e+00 -3.74060335e+00
      -9.15349743e+00  1.45741717e+00  4.86417857e+00 -3.22104163e+00
      -5.15992372e+00 -1.67482033e+00 -2.60031137e+00  2.95210822e+00
      -1.74184892e+01  1.83409498e+01  1.97490138e+01 -1.59620398e+01
      -1.30973132e+01  4.60798778e+00 -1.01451050e+01  1.23330899e+01
      -6.55800206e+00  1.05401573e+01 -4.83311857e+00  1.85483001e+01
       4.75546947e+00 -4.13653806e+00  9.92062410e+00  3.78637117e+00
       2.01668570e+00  1.59722595e+00  6.06608146e+00  1.36474566e+00
       6.93515465e+00  3.88832156e+00  4.83736544e-01 -9.16629114e+00
       9.05023801e-01 -1.02694715e+01 -4.50427775e+00  1.39188167e+01
      -5.00934022e-01  5.29260978e+00 -6.06160831e+00  1.08996240e+01
       5.82533411e+00  1.03377298e+01 -8.83652283e-01  1.67121917e+01
       1.71238158e+01 -4.02919547e+00  1.55311159e+01 -4.26489123e+00
       1.24687616e+01 -1.21923093e+01  1.35804498e+01 -5.69416076e+00
       1.20296013e+01 -1.62999177e+01  2.18392714e-01 -1.30974233e+01
       2.07722194e+00 -6.46294720e+00  6.48960773e+00 -9.23191343e+00
      -4.27964698e+00 -9.76476660e+00 -7.72989135e-01 -1.06100979e+01
       6.24692676e-01 -9.26151200e+00 -1.03481155e+01  4.34771408e+00
       1.86395407e+00 -1.31161009e+01 -2.57657448e-01 -7.59874124e+00
       2.78925468e-01 -1.59656061e+01 -5.12849728e+00 -2.55079701e+01
       3.69119088e-01 -6.87651382e+15 -4.12674721e+01]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 90.0),), n_basis=91, period=90.0),
    coefficients=[[-5.97100268e-01  4.08167755e-03  1.43442816e-03 -5.57936011e-03
       2.75620962e-03 -6.19894612e-04 -1.02099795e-03 -8.53481571e-03
       2.52391873e-03 -5.37445416e-04  2.54557169e-03  6.29916662e-04
      -4.21763153e-03  9.31233315e-03  2.73440933e-03 -1.92482916e-03
      -1.30413431e-02 -1.16759509e-04 -2.82474935e-03 -1.06220883e-03
       2.61543510e-03 -4.32127393e-03  1.92787663e-03 -1.17948914e-03
       1.44175692e-03 -3.80426192e-03 -9.72495073e-04 -7.20393029e-03
      -4.60507691e-03  7.51850118e-03  3.61530728e-03 -8.90685555e-03
       1.34985357e-02  1.06405374e-02  5.19499714e-03  1.52584186e-03
       2.58970454e-03  1.73146842e-02  1.17525671e-03  2.29587584e-03
      -9.34975537e-03  9.23316733e-03  1.99922557e-03 -9.31482590e-03
       1.04309378e-02  2.89046770e-03  4.57391178e-03 -1.86601579e-03
       7.99671800e-03  1.61748177e-03  5.27821834e-03 -4.87117989e-03
      -1.90605900e-03 -1.43856656e-02  1.50131873e-03 -4.49442777e-03
       5.51062304e-04 -6.40842251e-03  8.36989080e-03 -6.16938456e-04
      -4.62718442e-03  1.49837694e-02  1.40834252e-03  9.47650863e-05
       3.87405600e-03  1.10466351e-02  6.43477731e-04  2.08946703e-03
      -4.40427736e-03  9.94997935e-03 -7.86681406e-03  4.55531037e-04
      -2.91444401e-03 -7.27365402e-03  2.74931728e-03  1.92488663e-03
      -3.34754158e-03  7.33407294e-03 -8.94830732e-03 -6.14481402e-04
       6.05501550e-03 -4.62395141e-04  1.16326093e-03 -6.73567602e-03
       4.09372875e-05  3.13750407e-03  1.13814014e-02  1.24985031e-02
      -1.03222178e-02  3.93716731e+10 -4.28949816e-04]]) 

Sample window¶

In [28]:
print("System 1:")
B1_sample_window_funct_reg = Function_regression(B1_sample_window_combine_balanced, 20, ["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n", "System 2:")
B2_sample_window_funct_reg = Function_regression(B2_sample_window_combine_balanced, 20, ["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 20.0),), n_basis=21, period=20.0),
    coefficients=[[-1.19992659e+03 -6.14445230e-01  1.77863892e-02 -3.36699448e-01
       6.05370916e-01 -3.90224307e-01  2.16845388e-01 -5.04402509e-01
       1.42760995e+00 -2.19838625e+00  1.81664241e-01 -9.34748467e-02
      -1.16362905e+00 -3.41843459e-01  2.06659549e-01 -3.10065648e-01
       1.12370925e-01 -7.56605926e-01  1.28720853e-01 -1.01349493e+15
      -1.40219641e+00]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 20.0),), n_basis=21, period=20.0),
    coefficients=[[-7.79503415e-02 -5.89679082e-04 -2.66420720e-04 -3.61952375e-04
      -1.58443378e-03  7.61804146e-04 -1.53152449e-03  1.09224567e-03
      -2.97066289e-03  1.48850257e-03 -9.42854582e-05 -1.30842863e-03
       6.10766774e-04  2.70876113e-04 -1.68424390e-03 -1.31747076e-03
      -2.85573605e-03 -1.49575364e-03 -2.90301361e-04  2.79460728e+11
       2.55621243e-04]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 20.0),), n_basis=21, period=20.0),
    coefficients=[[-1.25686852e+03 -6.55712314e-01 -2.88321928e-01 -3.84227688e-01
       6.40059067e-01 -5.88227461e-01  1.12051151e-01  1.46212208e-02
       1.62893863e+00 -2.50553038e+00  1.52954829e-01 -7.76513856e-01
      -1.56806425e+00 -6.34433410e-01  1.05090220e-02 -7.32876408e-01
      -4.47222725e-01 -4.57108929e-01 -1.45505589e-01 -1.21361878e+15
      -1.76914709e+00]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 20.0),), n_basis=21, period=20.0),
    coefficients=[[-1.88092313e-01 -1.20695620e-03  1.24212074e-03 -9.20665959e-04
      -1.60190962e-03  1.48975114e-03 -7.16015079e-04 -2.93254774e-03
      -2.90414325e-03  4.92512358e-04  3.41204662e-04  3.40414078e-03
       1.80429727e-03  2.13880793e-03 -5.15746373e-04  1.22554371e-03
       8.90976768e-04 -4.46456160e-03  1.53025996e-03  1.62387763e+11
       6.75399762e-04]]) 

5.2. Coefficients visualization

As the results above show, the value at the first time point is much larger in magnitude than the others. The values at the last four time points are also far larger than the rest of the data, except in the Sensor A sample window, where only the last two points are affected.

  • The same pattern appears in both systems.
  • The same pattern appears in both sensors.

For ease of visualization, we remove these points from the plots.
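
The index ranges passed to `coefficent_visualization` in the cells that follow are fixed by hand. As a rough sketch of how such a range could be derived instead, the snippet below trims leading and trailing points whose magnitude dwarfs the rest. The helper `interior_range`, its `factor` threshold, and the toy values are all assumptions made for illustration; none of them come from this notebook's modules.

```python
from statistics import median


def interior_range(values, factor=10.0):
    """Return a range over the indices left after trimming edge outliers.

    A point counts as an outlier when its magnitude exceeds `factor` times
    the median magnitude. Only leading and trailing outliers are trimmed;
    interior points are never dropped. `factor` is an illustrative
    threshold, not a value taken from the notebook.
    """
    magnitudes = [abs(v) for v in values]
    cutoff = factor * median(magnitudes)
    start, stop = 0, len(values)
    # Skip outliers at the start of the curve.
    while start < stop and magnitudes[start] > cutoff:
        start += 1
    # Skip outliers at the end of the curve.
    while stop > start and magnitudes[stop - 1] > cutoff:
        stop -= 1
    return range(start, stop)


# Toy values mimicking the pattern described above: a large first point and
# large trailing points around an otherwise small curve.
toy = [-60.0, 0.4, -0.3, 0.5, 0.2, -0.1, 0.3, 45.0, -80.0]
keep = interior_range(toy)
print(keep)
```

A range obtained this way could be passed in place of a hand-written one such as `range(1, 36)`, provided the result is checked against the plots first.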

Sensor A

Cal window

In [29]:
coefficent_visualization(A1_cal_window_funct_reg, A2_cal_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 36), "SensorA Cal window")

Sample window

In [30]:
coefficent_visualization(A1_sample_window_funct_reg, A2_sample_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 23), "SensorA sample window")

Sensor B

Cal window

In [31]:
coefficent_visualization(B1_cal_window_funct_reg, B2_cal_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 86), "SensorB Cal window")

Sample window

In [32]:
coefficent_visualization(B1_sample_window_funct_reg, B2_sample_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 16), "SensorB Sample window")